Online Discriminative Spam Filter Training

نویسندگان

  • Joshua Goodman
  • Wen-tau Yih
چکیده

We describe a very simple technique for discriminatively training a spam filter. Our results on the TREC Enron spam corpus would have been the best for the Ham at .1% measure, and second best by the 1-ROCA measure. For the Mr. X corpus, our 1-ROCAmeasure was a close second best, and third best by the Ham at .1% measure. We use a very simple feature extractor (all words in the subject and headers). Our learning algorithm is also very simple: gradient descent of a logistic regression model.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Discriminative Topic Mining for Social Spam Detection

In the era of Social Web, there has been an explosive growth of user-contributed comments posted to various online social media. However, increasingly more misleading and deceptive user comments found at online social media have also been a great concern for consumers and merchants, and social spam have been brought to the attention by the legal circle in recent years. Social spam can cause tre...

متن کامل

Online Active Learning Methods for Fast Label-Efficient Spam Filtering

Active learning methods seek to reduce the number of labeled examples needed to train an effective classifier, and have natural appeal in spam filtering applications where trustworthy labels for messages may be costly to acquire. Past investigations of active learning in spam filtering have focused on the pool-based scenario, where there is assumed to be a large, unlabeled data set and the goal...

متن کامل

SVM-Based Spam Filter with Active and Online Learning

A realistic classification model for spam filtering should not only take account of the fact that spam evolves over time, but also that labeling a large number of examples for initial training can be expensive in terms of both time and money. This paper address the problem of separating legitimate emails from unsolicited ones with active and online learning algorithm, using a Support Vector Mac...

متن کامل

Discriminative estimation of subspace precision and mean (SPAM) models

The SPAM model was recently proposed as a very general method for modeling Gaussians with constrained means and covariances. It has been shown to yield significant error rate improvements over other methods of constraining covariances such as diagonal, semi-tied covariances, and extended maximum likelihood linear transformations. In this paper we address the problem of discriminative estimation...

متن کامل

Not So Naive Online Bayesian Spam Filter

Spam filtering, as a key problem in electronic communication, has drawn significant attention due to increasingly huge amounts of junk email on the Internet. Content-based filtering is one reliable method in combating with spammers changing tactics. Naı̈ve Bayes (NB) is one of the earliest content-based machine learning methods both in theory and practice in combating with spammers, which is eas...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2006